The United Nations (UN) is an international organisation committed to maintaining international peace and security and to promoting social progress and human rights (United Nations, 2017). The organisation actively promotes educational and economic development as key elements of its Sustainable Development Goals. https://www.un.org/en
Statistical analysis reveals a positive, statistically significant relationship between educational attainment and economic prosperity: countries with more mean years of schooling tend to have higher Gross National Income (GNI) per capita.
This report recommends that the United Nations intensify efforts to extend mean years of schooling globally, particularly in underperforming regions. Such a strategy not only aligns with the UN’s Sustainable Development Goals 4 and 8 (THE 17 GOALS | Sustainable Development, 2015) but also promises significant economic uplift.
Implementing this recommendation will advance global economic stability and reduce inequalities, reinforcing the UN’s commitment to sustainable development.
The dataset used is titled “Average global IQ per country with other stats”. It was collected and formatted by Matheus Felipe on Kaggle (https://www.kaggle.com/datasets/mlippo/average-global-iq-per-country-with-other-stats?resource=download).
Several rows are missing values for mean years of schooling and GNI, which may make the results less accurate and less representative.
The dataset consists of 193 rows, each representing a country or small territory. There are 10 columns, one per variable:
Rank
Country
Average IQ
Continent
Literacy Rate
Nobel Prizes
HDI
Mean years of schooling: Mean years of education that a country’s citizens receive.
GNI: Gross National Income of that country.
Population
The read.csv command classifies the columns when the dataset is imported. The column labels were changed from question format to short, precise names for convenience and reusability.
options(repos = c(CRAN = "https://cloud.r-project.org/"))
data = read.csv("~/Desktop/DATA1001/avgIQpercountry.csv")
colnames(data) = c(
"rank",
"country",
"averageiq",
"continent",
"literacyrate",
"nobelprizes",
"hdi",
"meanschoolyears",
"gni",
"population"
)
# Quick look at the first 6 rows of data
head(data)
## rank country averageiq continent literacyrate nobelprizes hdi
## 1 1 Japan 106.48 Asia 0.99 29 0.925
## 2 2 Taiwan 106.47 Asia 0.96 4 NA
## 3 3 Singapore 105.89 Asia 0.97 0 0.939
## 4 4 Hong Kong 105.37 Asia 0.94 1 0.952
## 5 5 China 104.10 Asia 0.96 8 0.768
## 6 6 South Korea 102.35 Asia 0.98 0 0.925
## meanschoolyears gni population
## 1 13.4 42274 123294513
## 2 NA NA 10143543
## 3 11.9 90919 6014723
## 4 12.2 62607 7491609
## 5 7.6 17504 1425671352
## 6 12.5 44501 51784059
## Size of data
dim(data)
## [1] 193 10
## R's classification of data
class(data)
## [1] "data.frame"
## R's classification of variables
str(data)
## 'data.frame': 193 obs. of 10 variables:
## $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ country : chr "Japan" "Taiwan" "Singapore" "Hong Kong" ...
## $ averageiq : num 106 106 106 105 104 ...
## $ continent : chr "Asia" "Asia" "Asia" "Asia" ...
## $ literacyrate : num 0.99 0.96 0.97 0.94 0.96 0.98 1 1 1 0.99 ...
## $ nobelprizes : int 29 4 0 1 8 0 2 5 0 111 ...
## $ hdi : num 0.925 NA 0.939 0.952 0.768 0.925 0.808 0.94 0.935 0.942 ...
## $ meanschoolyears: num 13.4 NA 11.9 12.2 7.6 12.5 12.1 12.9 12.5 14.1 ...
## $ gni : int 42274 NA 90919 62607 17504 44501 18849 49452 146830 54534 ...
## $ population : chr "123294513" "10143543" "6014723" "7491609" ...
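The str() output above lists population as chr rather than as a number. read.csv() infers each column's type from its contents, so a single entry that is not a plain number makes the whole column character. The sketch below uses a small made-up table, not the Kaggle file, and assumes the stray entries contain thousands separators; the actual cause in this dataset was not checked.

```r
# Toy CSV text standing in for the real file (values are illustrative only)
csv_text <- 'country,population
A,"1,200"
B,3400
C,560'

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(toy$population)   # "character", because "1,200" is not a plain number

# Assumption: the only problem is the comma separator.
# Strip it, then convert; anything still non-numeric would become NA.
toy$population <- as.numeric(gsub(",", "", toy$population))
class(toy$population)   # "numeric"
```

Population is not used in the analysis that follows, so this conversion is optional here.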
According to Marquez-Ramos and Mourelle (2019), education is important because it affects a country’s economic growth. In this dataset, education is quantified as mean years of schooling and economic growth is measured by Gross National Income (GNI).
To clarify how mean years of schooling impacts economic outcomes in varied contexts, countries are divided into lower and higher GNI groups based on median GNI, reducing variability within groups and enabling more precise comparisons.
median_gni <- median(data$gni, na.rm = TRUE)
lower_gni_group <- data[data$gni <= median_gni,]
higher_gni_group <- data[data$gni > median_gni,]
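One detail of the split above is worth noting: gni contains missing values, and a logical index that evaluates to NA returns a row filled with NA rather than dropping the row. Those placeholder rows are later ignored by lm() and by na.rm = TRUE, so the estimates are unaffected, but they do inflate nrow() of each group. The following is a minimal sketch on made-up data (the toy data frame is hypothetical) showing the behaviour and one way to avoid it with which():

```r
# Hypothetical four-country table with one missing GNI value
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  gni     = c(1000, NA, 3000, 5000)
)
med <- median(toy$gni, na.rm = TRUE)   # median of 1000, 3000, 5000

# Logical index containing NA: the NA position yields an all-NA row
with_na <- toy[toy$gni <= med, ]
nrow(with_na)      # 3 rows: A, an all-NA row, C

# which() keeps only the positions that are TRUE, so the NA row disappears
without_na <- toy[which(toy$gni <= med), ]
nrow(without_na)   # 2 rows: A and C
```

This is consistent with the regression summaries later in the report, which each note observations deleted due to missingness.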
To check whether a relationship between education and economic growth exists, and to visualise it, scatter plots are drawn for both the lower GNI and higher GNI groups.
library(tidyverse)
ggplot(lower_gni_group, aes(x = meanschoolyears, y = gni)) +
geom_point() +
geom_smooth(method="lm", colour = "#1b95e0", se = FALSE) +
theme_classic() +
ggtitle("Relationship of Mean School Years and GNI of countries with lower GNI") +
xlab("Country's Mean Years of School") +
ylab("Country's Gross National Income (GNI)")
library(tidyverse)
ggplot(higher_gni_group, aes(x = meanschoolyears, y = gni)) +
geom_point() +
geom_smooth(method="lm", colour = "#1b95e0", se = FALSE) +
theme_classic() +
ggtitle("Relationship of Mean School Years and GNI of countries with higher GNI") +
xlab("Country's Mean Years of School") +
ylab("Country's Gross National Income (GNI)")
The two scatter plots show that as a country’s mean years of schooling increases, its GNI also increases; there is therefore a positive relationship between the two variables. This is consistent with current research showing that as individuals receive more education, their income tends to increase (Clarke, 2022), which in turn aligns with the recommendation that higher education levels can lead to a higher national GNI.
With a visible relationship between mean years of schooling and GNI established, a linear model is used to measure it quantitatively, providing precise estimates and enabling tests of statistical significance.
low <- lm(gni ~ meanschoolyears, data = lower_gni_group)
summary(low)
##
## Call:
## lm(formula = gni ~ meanschoolyears, data = lower_gni_group)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5480.1 -1986.6 -188.3 2025.1 5302.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -822.5 775.2 -1.061 0.292
## meanschoolyears 953.4 107.6 8.863 7.87e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2626 on 88 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.4716, Adjusted R-squared: 0.4656
## F-statistic: 78.55 on 1 and 88 DF, p-value: 7.867e-14
high <- lm(gni ~ meanschoolyears, data = higher_gni_group)
summary(high)
##
## Call:
## lm(formula = gni ~ meanschoolyears, data = higher_gni_group)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29554 -13388 -4527 6654 104273
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -26650 14640 -1.820 0.0721 .
## meanschoolyears 5537 1275 4.341 3.81e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20470 on 87 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.178, Adjusted R-squared: 0.1686
## F-statistic: 18.85 on 1 and 87 DF, p-value: 3.813e-05
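Both p-values from the linear regression models of the lower and higher GNI groups are below 0.05, indicating a statistically significant positive relationship between mean years of schooling and GNI in both economic tiers. Each additional year of schooling is associated with an estimated increase in GNI of about 953 in the lower GNI group and about 5,537 in the higher GNI group, although schooling explains less of the variation in the higher group (R-squared of 0.18 versus 0.47). This supports the view that education positively impacts economic performance across different levels of national income.
A bar plot comparing the average of mean school years across GNI levels gives a clear visual summary of the educational disparity between lower and higher GNI countries.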
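To read more than the p-value from a fitted model, the slope, its confidence interval and R-squared can be pulled out of the lm object directly. The sketch below uses simulated data: the variable names mirror the report, but the numbers are invented, so its values are illustrative and are not the report's results.

```r
set.seed(1)

# Simulated stand-in for one GNI group (values are invented)
sim <- data.frame(meanschoolyears = runif(50, min = 2, max = 12))
sim$gni <- 1000 * sim$meanschoolyears + rnorm(50, sd = 2500)

fit <- lm(gni ~ meanschoolyears, data = sim)

# Estimated change in GNI per extra year of schooling
slope <- coef(fit)["meanschoolyears"]

# 95% confidence interval for that slope
ci <- confint(fit, "meanschoolyears", level = 0.95)

# Share of the variation in GNI explained by schooling
r2 <- summary(fit)$r.squared

slope
ci
r2
```

Reporting the interval alongside the slope shows how precisely the effect is estimated, which a p-value alone does not convey.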
mean_school_years_low <- mean(lower_gni_group$meanschoolyears, na.rm = TRUE)
mean_school_years_high <- mean(higher_gni_group$meanschoolyears, na.rm = TRUE)
group_means <- data.frame(
gni_level = c("Low", "High"),
mean_education_level = c(mean_school_years_low, mean_school_years_high)
)
library(ggplot2)
ggplot(group_means, aes(x = gni_level, y = mean_education_level, fill = gni_level)) +
geom_bar(stat = "identity", position = position_dodge(), show.legend = FALSE) +
scale_fill_manual(values = c("Low" = "darkblue", "High" = "darkgreen")) +
theme_minimal() +
labs(
title = "Comparison of Mean of Mean School Years by GNI Level",
x = "GNI Level",
y = "Mean School Years"
)
The bar plot shows that higher GNI countries tend to have more years of schooling. According to Vegas (2020), wealthier nations invest more in education, which suggests that increasing educational opportunities in lower GNI countries could be a key strategy for economic development.
H0: There is no difference in the mean years of schooling between the lower GNI group and the higher GNI group.
H1: There is a difference in the mean years of schooling between the lower GNI group and the higher GNI group.
A Welch Two Sample t-test is performed on the mean school years of the lower GNI group and the higher GNI group. As the p-value is smaller than 0.05, there is sufficient evidence to reject the null hypothesis. Therefore, there is a difference in mean years of schooling between the two GNI groups.
t.test(lower_gni_group$meanschoolyears, higher_gni_group$meanschoolyears, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: lower_gni_group$meanschoolyears and higher_gni_group$meanschoolyears
## t = -14.106, df = 154.54, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.267637 -3.973512
## sample estimates:
## mean of x mean of y
## 6.731111 11.351685
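The Welch statistic reported by t.test() can be reproduced from the group means, variances and sizes, which makes clear how unequal variances enter the test. The following is a minimal sketch on simulated samples; the vectors x and y are invented and are not the report's data.

```r
set.seed(42)
x <- rnorm(90, mean = 6.7, sd = 3.5)    # stand-in for the lower GNI group
y <- rnorm(89, mean = 11.4, sd = 2.3)   # stand-in for the higher GNI group

# Variance of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_manual  <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_manual <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

res <- t.test(x, y, var.equal = FALSE)
all.equal(unname(res$statistic), t_manual)    # TRUE
all.equal(unname(res$parameter), df_manual)   # TRUE
```

Because the degrees of freedom depend on both variances, they are generally not a whole number, as with the 154.54 shown in the output above.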
United Nations (2017). About Us | United Nations. United Nations. https://www.un.org/en/about-us
THE 17 GOALS | Sustainable Development. (2015). Un.org. https://sdgs.un.org/goals
Marquez-Ramos, L., & Mourelle, E. (2019). Education and economic growth: an empirical analysis of nonlinearities. Applied Economic Analysis, 27(79), 21–45. https://doi.org/10.1108/aea-06-2019-0005
Clarke, M. (2022). Income - Department of Education, Australian Government. Department of Education. https://www.education.gov.au/integrated-data-research/benefits-educational-attainment/income
Vegas, E. (2020, June 19). Investing in public education worldwide is now more important than ever. Brookings. https://www.brookings.edu/articles/investing-in-public-education-worldwide-is-now-more-important-than-ever/
The United Nations was chosen as the client for this project’s recommendation because of its global influence and commitment to promoting quality education.
Independence: The samples are all independent as each country is only used once.
This is done in section 3.4
Scatter plots: The linearity of each relationship can be seen in the scatter plots in section 2.2.
Residual plots: The points on both residual plots appear to scatter randomly around the horizontal axis, which indicates homoscedasticity.
Thus, the two assumptions are met for the linear regression models fitted in section 2.2.
library(ggplot2)
model_low <- lm(gni ~ meanschoolyears, data = lower_gni_group)
residuals_df <- data.frame(
resid = resid(model_low),
fitted = fitted(model_low)
)
ggplot(residuals_df, aes(x = fitted, y = resid)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(x = "Fitted Values", y = "Residuals", title = "Residual vs. Fitted Plot for Lower GNI group") +
theme_minimal()
library(ggplot2)
model_high <- lm(gni ~ meanschoolyears, data = higher_gni_group)
residuals_df <- data.frame(
resid = resid(model_high),
fitted = fitted(model_high)
)
ggplot(residuals_df, aes(x = fitted, y = resid)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(x = "Fitted Values", y = "Residuals", title = "Residual vs. Fitted Plot for Higher GNI group") +
theme_minimal()
Normality:
Box plot: The comparative box plots display symmetrical distributions with no significant skewness or outliers. This suggests that the data conform well to a normal distribution.
QQ plots: The points on the QQ plots lie close to a straight line, confirming that the data closely follow a normal distribution.
Normality Check: As both sample sizes are larger than 30, the Central Limit Theorem implies that the sample means are approximately normal.
qqnorm(lower_gni_group$meanschoolyears); qqline(lower_gni_group$meanschoolyears)
qqnorm(higher_gni_group$meanschoolyears); qqline(higher_gni_group$meanschoolyears)
boxplot(lower_gni_group$meanschoolyears, higher_gni_group$meanschoolyears, names=c("Lower GNI", "Higher GNI"), main="Mean School Years Comparative Boxplots")
paste('Sample size of lower GNI group:', sum(complete.cases(lower_gni_group)))
## [1] "Sample size of lower GNI group: 90"
paste('Sample size of higher GNI group:', sum(complete.cases(higher_gni_group)))
## [1] "Sample size of higher GNI group: 89"
Equal Spread:
Variance Test: As the p-value is smaller than 0.05, the null hypothesis is rejected and the variances of the groups are taken to be unequal. Thus, the Welch Two Sample t-test is used in section 2.3 instead of the equal-variance two-sample t-test.
H0: The variances of the groups are equal
H1: The variances of the groups are not equal
var.test(lower_gni_group$meanschoolyears, higher_gni_group$meanschoolyears)
##
## F test to compare two variances
##
## data: lower_gni_group$meanschoolyears and higher_gni_group$meanschoolyears
## F = 2.2875, num df = 89, denom df = 88, p-value = 0.000131
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 1.503518 3.478428
## sample estimates:
## ratio of variances
## 2.287468
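The F statistic in var.test() is the ratio of the two sample variances, with degrees of freedom equal to each group's size minus one. The following is a short sketch on simulated vectors (invented data, not the report's) that checks this:

```r
set.seed(7)
a <- rnorm(90, sd = 3.5)   # stand-in for the lower GNI group
b <- rnorm(89, sd = 2.3)   # stand-in for the higher GNI group

# Ratio of sample variances, first group over second
f_manual <- var(a) / var(b)

res <- var.test(a, b)
all.equal(unname(res$statistic), f_manual)   # TRUE
unname(res$parameter)                        # 89 88
```

A ratio well above 1, such as the 2.29 reported above, indicates that the first group is more spread out than the second.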
With a t-value of -14.106 and a p-value < 2.2e-16, there is a statistically significant difference between the mean school years of the lower GNI and higher GNI groups.
Statistical conclusion: As the p-value < 0.05, the null hypothesis is rejected. There is a difference in the mean years of schooling between the lower GNI group and the higher GNI group.
Scientific conclusion: The data suggest that a country’s mean years of schooling is related to its GNI: on average, higher GNI countries have about 4.6 more years of schooling than lower GNI countries (11.35 versus 6.73).